Abstract

This article resolves a longstanding question in the axiomatisation of entropy as pro- 
posed by Shannon and highlighted in renewed concerns expressed by Jaynes. We introduce 
a companion measure of a probability distribution that we suggest be called the extropy of 
the distribution. The entropy and the extropy of an event distribution are identical. How- 
ever, this identical measure bifurcates into distinct measures for any quantity that is not 
merely an event indicator. As for entropy, the maximum extropy distribution is also the 
uniform distribution. We display several theoretical and geometrical properties of the pro- 
posed extropy measure, discussing in detail the difference between its assessment of a refined 
probability distribution and the axiom that characterises the Shannon entropy in this regard. 
This is what resolves the concerns of Shannon and Jaynes. In a discrete context, the extropy 
measure is approximated by a variant of Gini's index of heterogeneity when the maximum 
probability mass is small. This is related to the "repeat rate" of a mass function as studied 
by Turing and Good. The continuous analogue of extropy turns out to equal the negative 
integral of the square of the density function. We conclude with a consideration of a rescaled 
measure of extropy which identifies it as the dual of entropy. The structure of the duality 
suggests a general theory of complementary distributions. 

Key Words: entropy; extropy; Gini index of heterogeneity; repeat rate; 
duality; proper scoring rules 

1 Motivation, scope, and background 

The entropy measure of a probability distribution has had myriad useful applications in in- 
formation sciences since its full-blown introduction in the extensive article of Shannon (1948). 
Prefigured by its usage in thermodynamics by Boltzmann and Gibbs, entropy has subsequently
bloomed as a showpiece in theories of communication, coding, probability and statistics. So 
widespread is its application and advocacy, it is with due respect that we propose this measure 
has a natural complement which merits recognition and comparison, perhaps in many realms of 
its current application ... the measure of extropy. 

In this article we display several intriguing properties of this information measure which 
identify extropy as both a complement and a dual to entropy. Not only does its recognition 
resolve a fundamental question that has surrounded Shannon's measure since its very inception, 
but it also provides links to other notable information measures whose relation to entropy has
not been recognised. Our presentation of the properties of extropy is meant to air the results for 
readers relatively familiar with the Shannon entropy theory. We shall follow his notation and 
extend it. Discussion of our specific applied motivation for generating these results, and their 
relation to the theory of proper scoring rules would be distracting as an introduction. 

Suppose X is a quantity with a finite discrete realm of possibilities $\{x_1, x_2, \ldots, x_N\}$. If a
probability distribution over the partition of events $(X = x_1), (X = x_2), \ldots, (X = x_N)$ is composed
of the vector of probability masses $\mathbf{p}_N = (p_1, p_2, \ldots, p_N)$, the Shannon entropy measure denoted by
$H(X)$ or $H(\mathbf{p}_N)$ equals $-\sum_{i=1}^{N} p_i \log(p_i)$. The complementary measure we propose as extropy,
and denote here by $J(X)$ or $J(\mathbf{p}_N)$, equals $-\sum_{i=1}^{N} (1 - p_i) \log(1 - p_i)$. As is entropy, extropy
is interpreted as a measure of the amount of uncertainty represented by the distribution for
X. The results that conclude this note will suggest a different location and scaling for extropy
inhering in the alternative measure $J^*(\mathbf{p}_N) = (N - 1)^{-1} J(\mathbf{p}_N) + \log(N - 1)$. With this
scaling, the extropy measure can be recognised formally as the dual of entropy. Moreover, its
function value for a mass function equals the entropy of a complementary distribution which
shall be identified in our Section 5.

Shannon's (1948) original and fairly exhaustive investigation of entropy characterised it as
the unique (up to location and scale transformations) measure $H(\cdot)$ of a mass function $\mathbf{p}_N$ over
a partition of events that satisfies three properties:

i.) $H(p_1, p_2, \ldots, p_N)$ is continuous in each of its arguments;

ii.) $H(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N})$ is a monotonic increasing function of the partition size, N; and

iii.) $H(tp, (1-t)p, (1-p)) = H(p, 1-p) + p\,H(t, 1-t)$.

The article of Renyi (1961) presented alternative characterisations of entropy due to Fadeev 
and himself. These involved alternating these axioms with various properties of the Shannon 
measure, such as its invariance with respect to permutations of its arguments, and its achieved 
maximum occurring at the uniform distribution.

Shannon's third axiom implies that the entropy in a joint distribution for two quantities 
equals the entropy in the marginal distribution for one of them plus the expectation for the 
entropy in the conditional distribution for the second given the first:

$$H(X, Y) = H(X) + \sum_{i=1}^{N} P(X = x_i)\, H(Y \mid X = x_i). \qquad (1)$$

Indeed the appeal of this result was a motivation favouring Shannon's choice of his axiom iii. 

In his original extensive article, Shannon (p. 50 in the 1949 reprint) slighted his own char- 
acterisation theorem for entropy, noting that it was in no way necessary for the larger theory of 
communication he was developing. He viewed it merely as lending plausibility to some subse- 
quent definitions. He considered the real justification of the three axioms of entropy as residing 
in their desirable useful implications. In particular, the implication that the joint entropy in two 
quantities equals the entropy in one plus the expected conditional entropy in the other given 
that one (our equation (1)) was regarded as welcome substantiation for entropy as a reasonable
measure of information. 

The thoughtful discussion of Jaynes (2003, Section 11.3), who was a major contributor to the 
understanding of entropy and its importance, explicitly recognised the discussable open status 
of entropy's third characterising axiom. Having developed his discussion of probability some 
350 pages without requiring it, he highlighted it as "really an additional assumption which we 
should have included in our list." He followed this statement with an "Exercise 11.1" which 
concludes with the injunction to "Carry out some new research in this field by investigating this 
matter; try either to find a possible form of the new functional equations, or to explain why this 
cannot be done." 

As we read him, Jaynes clearly expected that a satisfactory motivation for the special status 
of entropy as a measure of information would be found, thinking that his "exercise" would be 
resolved with a solution explaining "why this cannot be done". In a direct sense, our construction
and analysis of the extropy measure shows the exercise to be solved rather by the exhibition 
of a "new functional equation", providing an alternative to Shannon's third axiom. Jaynes' 
expectations regarding this matter led him, we believe, to one of his rare overstatements of the 
status of entropy as a unique measure of information. He wrote (2003, p. 350) "We have shown 
that use of the measure (Shannon entropy) is a necessary condition for consistency", and further
conjectured "that any other choice of 'information measure' will lead to inconsistencies if carried
far enough". We only remark that to be precise, what was shown is that Shannon's definition of
entropy is necessary for consistency with the third proposed axiom. Concerns with a foundational
establishment of the uniqueness of entropy were also aired by Kolmogorov (1956, p. 105).

Despite his expectations, Jaynes was not convinced that an adequate foundation for the 
uniqueness claims of entropy as an information measure had been found. He concluded that 
long section of his book by writing (p. 351) "Although the above demonstration appears satisfac- 
tory mathematically, it is not yet in completely satisfactory form conceptually. The functional 
equation (Shannon's third axiom) does not seem quite so intuitively compelling as our previous 
ones did. In this case, the trouble is probably that we have not yet learned how to verbalize the 
argument leading to (axiom iii) in a fully convincing manner. Perhaps this will inspire others
to try their hand at improving the verbiage that we used just before writing (axiom iii)." We
believe that a person of Jaynes' imagination and insight would have been pleased with our surprising
resolution of his dissatisfaction. In tandem with Shannon's entropy measure denoted by
$H(\cdot)$, we appropriately denote our extropy measure by $J(\cdot)$.

2 The characterisation of extropy 

Context: Suppose that the possible values of an unknown but observable quantity X are the
numbers in the realm $\mathcal{R}(X) = \{x_1, x_2, \ldots, x_N\}$. The vector $\mathbf{p}_N = (p_1, p_2, \ldots, p_N)$ denotes an
associated probability mass function asserted for X over the event partition
$\{(X = x_1), (X = x_2), \ldots, (X = x_N)\}$. We recall

Definition 1: The entropy in X or in $\mathbf{p}_N$ equals

$$H(X) = H(\mathbf{p}_N) = -\sum_{i=1}^{N} p_i \log(p_i), \qquad (2)$$

and we introduce

Definition 2: The extropy in X or in $\mathbf{p}_N$ equals

$$J(X) = J(\mathbf{p}_N) = -\sum_{i=1}^{N} (1 - p_i) \log(1 - p_i). \qquad (3)$$

Result 1: If N = 2, so X is merely an event, then $H(X) = J(X)$,

for $H(\mathbf{p}_2) = -p_1 \log(p_1) - (1 - p_1) \log(1 - p_1) = J(\mathbf{p}_2)$.

However, $H(\mathbf{p}_N) > J(\mathbf{p}_N)$ as long as $\mathbf{p}_N$ contains three or more non-zero components.
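
Definitions 1 and 2 and Result 1 can be checked numerically. The following Python sketch is our own illustration rather than part of the derivation; the function names `entropy` and `extropy` and the example mass functions are assumptions chosen for the check, and natural logarithms are used throughout.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i log(p_i), using the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def extropy(p):
    """Extropy J(p) = -sum (1 - p_i) log(1 - p_i), using the convention 0 log 0 = 0."""
    return -sum((1 - x) * math.log(1 - x) for x in p if x < 1)

binary = [0.3, 0.7]          # N = 2: an event distribution
ternary = [0.2, 0.3, 0.5]    # N = 3 with three non-zero components
padded = [0.3, 0.7, 0.0]     # N = 3 with one zero component

print(entropy(binary), extropy(binary))     # identical values
print(entropy(ternary), extropy(ternary))   # entropy exceeds extropy
print(entropy(padded), extropy(padded))     # identical again
```

The third example anticipates the remark that follows: a zero component leaves both measures unchanged, so equality is recovered.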

When N = 3 it is no longer necessary that $H(\mathbf{p}_3)$ equals $J(\mathbf{p}_3)$. In fact, these will be equal
only for mass functions $\mathbf{p}_3$ that have one component equal to 0. (In this case the distribution
has only two non-zero components which sum to 1, just as a distribution when N = 2. By
convention, $p_i \log(p_i) = 0$ when $p_i = 0$, preserving continuity.) Figure 1 displays the range

[Figure 1 image not reproduced: six panels, with axes labelled Entropy $H(\mathbf{p}_2)$ through Entropy $H(\mathbf{p}_7)$.]

Figure 1: The range of (entropy, extropy) pairs $(H(\cdot), J(\cdot))$ corresponding to all distributions
within the unit-simplices of dimension 1 through 6. The realms of the quantities they assess
have sizes N = 2 through 7.

of possibilities for the (entropy, extropy) pairs for probability mass functions within the unit-simplices
of Dimension 1 through 6. This range expands regularly as the size of $\mathcal{R}(X)$ increases
from 2 to 7; for a unit-simplex of Dimension K contains the unit-simplex of dimension (K - 1)
as one of its "faces". (The added dimension of possibility may be assessed with probability zero,
so in this case the associated distributions would have the same range for H and J as the lower
dimension.) An algebraic proof that $H(\mathbf{p}_N) \ge J(\mathbf{p}_N)$ is submitted as an appendix.

Notice particularly that the range of possible (entropy, extropy) pairs is not convex. As 
viewed across the six examples shown in Figure 1, the range exhibits convex scallops along its
upper boundary: there are (N-2) scallops and one flat edge along its upper boundary for the
unit-simplex of dimension (N-1). The flat edge as the northwest boundary is the line defined by
$H(p, 1-p) = J(p, 1-p)$, running in the southwest to northeast direction from (0,0) to (-log(.5),
-log(.5)). The lower boundary of the range of pairs is a single concave scallop, ruling its own
interior out of the range. 

Result 2. J(X) satisfies Shannon's axioms i and ii.
The function $J(\cdot)$ is evidently continuous in its arguments, and

$$J(\tfrac{1}{N}, \tfrac{1}{N}, \ldots, \tfrac{1}{N}) = -N\left(1 - \tfrac{1}{N}\right)\log\left(1 - \tfrac{1}{N}\right) = -(N-1)\{\log(N-1) - \log(N)\}$$

is a monotonic increasing function of N.

As to other touted properties of entropy, extropy shares many of them. For example the 
extropy measure is obviously permutation invariant. Moreover, for any size of N, the maximum 
extropy distribution is the uniform distribution. We prove this as follows. Let $L(\mathbf{p}_N, \lambda)$ be the
Lagrangian expression for the extropy of $\mathbf{p}_N$ subject to the constraint $\sum p_i = 1$. Then

$$L(\mathbf{p}_N, \lambda) = -\sum_{i=1}^{N} (1 - p_i)\log(1 - p_i) + \lambda \left(1 - \sum_{i=1}^{N} p_i\right) \qquad (4)$$

with N partial derivatives of the form $\partial L / \partial p_i = \log(1 - p_i) + 1 - \lambda$. Setting each of these
equal to 0 yields N equations of the form $\lambda = 1 + \log(1 - p_i)$. These N equations, together with
$\partial L / \partial \lambda = 0$, ensure that all the $p_i$ are equal, and thus they must each equal 1/N. The bordered
Hessian determinants for the matrix of cross-partial derivatives alternate in sign, assuring that
$L(\cdot, \cdot)$ achieves a maximum at this solution.

The scale of the maximum entropy measure is unbounded as N increases, since $H(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}) = \log(N)$.
In contrast, the scale of the maximum extropy is bounded by 1, for $J(\frac{1}{N}, \frac{1}{N}, \ldots, \frac{1}{N}) = -(N - 1) \log\{(N - 1)/N\}$.
The limit of 1 can be determined by observing that

$$\lim_{N \to \infty} -(N-1)\log\left(\frac{N-1}{N}\right) = \lim_{N \to \infty} -\log\left(1 - \frac{1}{N}\right)^{N-1} = -\log(e^{-1}) = 1.$$
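
The contrast between the two scales, and the claim that the uniform distribution maximises extropy, can be illustrated numerically. The sketch below is our own hedged illustration: the helper names, the choice N = 5, and the use of randomly generated mass functions are assumptions, and the random check is supporting evidence only, not a substitute for the Lagrangian argument.

```python
import math
import random

def extropy(p):
    """Extropy J(p) = -sum (1 - p_i) log(1 - p_i), with 0 log 0 = 0."""
    return -sum((1 - x) * math.log(1 - x) for x in p if x < 1)

def uniform_extropy(n):
    """Extropy of the uniform mass function on n points: -(n-1) log((n-1)/n)."""
    return -(n - 1) * math.log((n - 1) / n)

def random_pmf(n, rng):
    """A random mass function on n points (normalised positive weights)."""
    weights = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

# Maximum entropy log(N) grows without bound; maximum extropy approaches 1.
for n in [2, 3, 10, 100, 10_000]:
    print(n, round(math.log(n), 4), round(uniform_extropy(n), 4))

# No randomly drawn mass function on 5 points should beat the uniform one.
rng = random.Random(0)
never_exceeds_uniform = all(
    extropy(random_pmf(5, rng)) <= uniform_extropy(5) + 1e-12
    for _ in range(1000)
)
print(never_exceeds_uniform)
```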

It is interesting now to examine precisely why extropy does not satisfy Shannon's third axiom 
for entropy, and how it does behave with respect to measuring the refinement of a probability 
distribution. 

Result 3. $J(tp, (1-t)p, 1-p) = J(p, 1-p) + \Delta(p, t)$, where

$$\Delta(p, t) = (1 - p)\log(1 - p) - (1 - tp)\log(1 - tp) - \{1 - (1-t)p\}\log\{1 - (1-t)p\}.$$

This result follows easily from the definition of $J(\mathbf{p}_3)$. It is most easily interpreted visually.
The right panel of Figure 2 displays the extropy $J(p, 1-p)$ along with the difference between
the extropies $J(tp, (1-t)p, 1-p)$ and $J(p, 1-p)$ according to Result 3, while the left panel
displays the difference between the entropies $H(tp, (1-t)p, 1-p)$ and $H(p, 1-p)$ according to
Shannon's axiom iii. Each difference is shown as a function of $p \in [0, 1]$ for the four values of
t = .1, .2, .3, and .5. Either difference function for any value of t would be the same as the
function for the value (1 - t).
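
Both refinement rules can be verified directly from the definitions. The following sketch is our own illustration; the helper `xlogx`, the function name `delta`, and the test values p = 0.6, t = 0.3 are assumptions chosen only for the check.

```python
import math

def xlogx(x):
    """x log(x) with the convention 0 log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def entropy(p):
    return -sum(xlogx(x) for x in p)

def extropy(p):
    return -sum(xlogx(1 - x) for x in p)

def delta(p, t):
    """The extropy refinement term of Result 3."""
    return xlogx(1 - p) - xlogx(1 - t * p) - xlogx(1 - (1 - t) * p)

p, t = 0.6, 0.3
refined = [t * p, (1 - t) * p, 1 - p]

# Shannon's axiom iii: the entropy gain is linear in p.
entropy_gain = entropy(refined) - entropy([p, 1 - p])
print(entropy_gain, p * entropy([t, 1 - t]))

# Result 3: the extropy gain equals delta(p, t).
extropy_gain = extropy(refined) - extropy([p, 1 - p])
print(extropy_gain, delta(p, t))

# Symmetry in t and 1 - t, as noted for both difference functions.
print(delta(p, t), delta(p, 1 - t))
```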
According to Shannon's axiom iii, the entropy for the refined distribution $(tp, (1-t)p, 1-p)$
increases linearly with p at the rate of the entropy in the refining split factor, $H(t, 1-t)$. In
contrast, the extropy of the refined distribution increases at an increasing rate as a function
of p. For small values of p, the extropy of the refined distribution increases more slowly with
p than does entropy, while for large values of p it increases more quickly. As p increases to
1, the increases in the entropy and extropy of the refined distribution become equal, for any
$t \in [0, 1]$. This is a result of the fact that when p = 1, the refined distribution is virtually a
binary distribution (t, 1 — t, 0), for which entropy and extropy are equal. The distribution that 
is being refined would then be a degenerate distribution representing certainty. 

As a gauge of the increase in uncertainty provided when a distribution is refined, this non- 
linear feature of the extropy measure is appealing. Refining a larger probability with a splitting 
factor of size t may well be considered to increase the amount of information at a greater rate 
than refining a smaller probability by this same factor. This is a natural feature of the extropy 
measure. Replacing Shannon's axiom iii with our Result 3 would complete the characterisation 
of extropy. 

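Result 4. $H(\mathbf{p}_N) + J(\mathbf{p}_N) = \sum_{i=1}^{N} H(p_i, 1 - p_i) = \sum_{i=1}^{N} J(p_i, 1 - p_i)$.

Thus, $J(\mathbf{p}_N) = \sum_{i=1}^{N} H(p_i, 1 - p_i) - H(\mathbf{p}_N)$.
Symmetrically, $H(\mathbf{p}_N) = \sum_{i=1}^{N} J(p_i, 1 - p_i) - J(\mathbf{p}_N)$.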

The basic equation for the sum of $H(\cdot)$ and $J(\cdot)$ derives from simply summing the two components
of each of the $H(p_i, 1 - p_i) = -p_i \log(p_i) - (1 - p_i)\log(1 - p_i)$. The resulting equation displays
that the extropy of a distribution equals the difference between the sum of the entropies in the
crudest partitions defined by the possible values of X, that is $(X = x_i)$ and $(X \ne x_i)$, and the
entropy in the finest partition they define: $(X = x_1), (X = x_2), \ldots$ and $(X = x_N)$. Moreover,
since the entropy of any event equals the extropy of the event, this relation is symmetric in the
functions $H(\cdot)$ and $J(\cdot)$. It is apparent that the symmetric relation between entropy and extropy
is fundamentally related to the refinement characteristics inherent in their third axioms. It also
suggests a sense in which the extropy measure is a dual of entropy. This idea will be explored
further in Section 5.
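
Result 4 is easy to confirm on any particular mass function. The sketch below is our own illustration; the function names and the example vector are assumptions.

```python
import math

def xlogx(x):
    """x log(x) with the convention 0 log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def entropy(p):
    return -sum(xlogx(x) for x in p)

def extropy(p):
    return -sum(xlogx(1 - x) for x in p)

def event_entropy(x):
    """Entropy (equivalently extropy) of the event partition (x, 1 - x)."""
    return -xlogx(x) - xlogx(1 - x)

p = [0.1, 0.2, 0.3, 0.4]
sum_of_event_entropies = sum(event_entropy(x) for x in p)

# H(p) + J(p) equals the sum of the N event entropies H(p_i, 1 - p_i).
print(entropy(p) + extropy(p), sum_of_event_entropies)
```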

3 Isoentropy and Isoextropy contours in the unit-simplex 

For the display that follows, we suppose that a quantity X has realm $\mathcal{R}(X) = \{1, 2, 3\}$, and
that these possibilities are assessed with a probability mass function $\mathbf{p}_3$ in the unit-simplex $S^2$.
Figure 3 (left) displays some contours of constant entropy distributions in the 2-Dimensional
unit-simplex (N = 3), to compare with some contours of constant extropy distributions in Figure
3 (right). These contours exhibit a geometrical sense in which the extropy and entropy measures
of a distribution are complementary. Whereas entropy contours sharpen into the vertices of the
simplex and flatten along the faces, the extropy contours sharpen into the midpoints of the faces
and flatten into the vertices.




[Figure 3 image not reproduced: two panels, each a unit-simplex with base vertices labelled (1,0,0) and (0,0,1).]

Figure 3: At left are contours of equal entropy distributions within the 2-D unit-simplex. At
right are contours of equal extropy distributions.

4 The extropy measure of a continuous distribution

Devising the extropy measure of a continuous distribution admitting a density function yields 
a pleasant surprise. Shannon (1948, Section III. 20) already proposed that the entropy measure 
$-\sum p_i \log(p_i)$ has a continuous analogue in the measure $-\int f(x) \log f(x)\, dx$ when the distribution
function admits a continuous density. Kolmogorov (1956) concurred, with slight qualifying 
reservations. However, he also voiced concerns about the primacy of the entropy measure (p. 
105), insisting that further analysis was needed. The analogical character of Shannon's entropy 
measure for a continuous density derives from its status as the limit of a linear transformation 
of the discrete entropy measure. 

4.1 Shannon's continuous entropy, $-\int f(x) \log f(x)\, dx$

For the following simple exposition of Shannon's considerations, presume again that the realm
of a quantity X is $\{x_1, \ldots, x_N\}$, and that the values of $x_1$ and $x_N$ are fixed. For each larger value
of N, presume that more elements are included uniformly in the interval between them. Define
$\Delta x = (x_N - x_1)/(N - 1)$ for any specific N, and define $f(x_i) = p_i/\Delta x$. In these terms, the
entropy $H(\mathbf{p}_N)$ can be expressed as

$$-\sum p_i \log(p_i) = -\sum f(x_i)\, \Delta x\, \log\{f(x_i)\, \Delta x\} = -\sum f(x_i) \log\{f(x_i)\}\, \Delta x - \log(\Delta x). \qquad (5)$$

Thus, $-\sum f(x_i) \log\{f(x_i)\}\, \Delta x$ is merely a location transform of the entropy $-\sum p_i \log(p_i)$,
shifting only by the size of $\log\{\Delta x\}$ which is finite for any N. The limit of the relocated entropy
expression suggests the continuous analogue as $-\int f(x) \log f(x)\, dx$.
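
The relocation argument can be illustrated numerically. In the sketch below, which is our own illustration, we assume a standard normal density truncated to the interval [-8, 8] and renormalise the discretised masses so that they sum to 1; the function names and grid sizes are likewise assumptions. The known continuous entropy of the standard normal density, $\frac{1}{2}\log(2\pi e)$, serves as the reference value.

```python
import math

def normal_density(x):
    """Standard normal density, used here only as an example density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def relocated_entropy(n, lo=-8.0, hi=8.0):
    """Discrete entropy of p_i = f(x_i) * dx (renormalised), shifted by log(dx)."""
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    weights = [normal_density(x) for x in xs]
    total = sum(weights)
    p = [w / total for w in weights]
    discrete_entropy = -sum(q * math.log(q) for q in p if q > 0)
    return discrete_entropy + math.log(dx)

reference = 0.5 * math.log(2 * math.pi * math.e)
for n in [101, 1001, 2001]:
    # The relocated value settles at the reference; the raw discrete
    # entropy itself keeps growing like -log(dx) as the grid is refined.
    print(n, relocated_entropy(n), reference)
```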

Shannon noted that this analogous measure loses the absolute meaning that the finite measure
enjoys, because its value must be considered relative to an assumed standard of the coordinate
system in which the value of the variable is expressed. (If the variable X were transformed
into Y, then the continuous measure of the entropy is adjusted by the Jacobian of the transfor- 
mation.) However, the continuous analogue retains its value as a comparative measure of the 
uncertainties contained in two densities because they would both be affected by the transforma- 
tion in the same way. 

4.2 Motivating the continuous extropy measure as $-\int f^2(x)\, dx$

At first sight, the extropy measure $-\sum (1 - p_i)\log(1 - p_i)$ appears problematic: if each $p_i$ were
simply replaced by a density value f(x), the measure would not be defined when f(x) > 1,
which it may. However, the situation clarifies by expanding $(1 - p_i)\log(1 - p_i)$ through three
terms of its Maclaurin series with remainder: $(1 - p_i)\log(1 - p_i) = -p_i + p_i^2/2 + r_i^3/6$ for
some $r_i \in (0, p_i)$. Summing these expansion terms over $i = 1, \ldots, N$ shows that when the realm
of possibilities for X increases (as a result of larger N) in such a way that $\Delta x \to 0$ and $\max_i p_i$
decreases toward 0, the extropy measure becomes closely approximated by $1 - \frac{1}{2}\sum_{i=1}^{N} p_i^2$.

Following the same tack as for entropy in representing $p_i$ by $f(x_i)\Delta x$ suggests that for large
N the analogue continuous extropy measure can be approximated by

$$1 - \frac{1}{2}\sum_{i=1}^{N} p_i^2 = 1 - \frac{1}{2}\sum_{i=1}^{N} f^2(x_i)(\Delta x)^2 = 1 - \frac{\Delta x}{2}\sum_{i=1}^{N} f^2(x_i)\, \Delta x. \qquad (6)$$

This approximation is merely a location and scale transformation of the term $-\sum f^2(x_i)\, \Delta x$.
Thus, the limiting measure of extropy for a continuous density is well regarded as $-\int f^2(x)\, dx$.
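
The quality of the quadratic approximation when the maximum probability mass is small can be seen in a short computation. The sketch below is our own illustration; the function names and the two example mass functions (uniform, and masses proportional to i) are assumptions.

```python
import math

def extropy(p):
    """Extropy J(p) = -sum (1 - p_i) log(1 - p_i), with 0 log 0 = 0."""
    return -sum((1 - x) * math.log(1 - x) for x in p if x < 1)

def quadratic_approximation(p):
    """The approximation 1 - (1/2) sum p_i^2 to the extropy."""
    return 1 - 0.5 * sum(x * x for x in p)

n = 1000
uniform = [1 / n] * n
total = n * (n + 1) / 2
linear = [i / total for i in range(1, n + 1)]   # masses proportional to i

for name, p in [("uniform", uniform), ("linear", linear)]:
    # With max p_i of order 1/n, the two values agree to roughly 1/n^2.
    print(name, max(p), extropy(p), quadratic_approximation(p))

# With a large maximum mass the approximation is visibly cruder.
coarse = [0.7, 0.2, 0.1]
print(extropy(coarse), quadratic_approximation(coarse))
```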

The sum of the squares of probability masses (as well as the integral of the square of a 
density) has received attention for more than a century for a variety of reasons, but never in 
a direct relation to the entropy of a distribution. Rather, it has merely been considered an 
alternative measure of uncertainty. Good (1979) referred to this measure as the "repeat rate" 
of a distribution, developing an original idea of Turing. Gini (1912, 1939) had earlier proposed 
this measure as an "index of heterogeneity" of a discrete distribution, via $1 - \sum_{i=1}^{N} p_i^2$. We now
find that in a discrete context, a rescaling of Gini's index is an approximation to the extropy 
of a distribution when the maximum probability mass is small. In a continuous context, the 
negative expected value of a density function value is the analogue of the extropy measure of a 
distribution that we are proposing. 

5 Rescaling extropy: a theory of complementary distributions 

Let us return to the definition of extropy for a discrete probability mass function over a finite 
partition. We have noted in Section 2 how the scaling of our entropy measure $H(\cdot)$ is different
from that of our extropy measure $J(\cdot)$: the range of the former is unbounded as N increases,
while that of the latter is bounded by 1.

Suppose we redefine extropy as extropy* according to a location and scale transformation: 

$$J^*(\mathbf{p}_N) = (N-1)^{-1} J(\mathbf{p}_N) + \log(N-1)$$

$$\phantom{J^*(\mathbf{p}_N)} = -\sum_{i=1}^{N} \left(\frac{1 - p_i}{N-1}\right) \log\left(\frac{1 - p_i}{N-1}\right). \qquad (7)$$

The second line follows from the transformation definition in the first line by simple algebra. It
portrays an intriguing result.

Result 5. The extropy* of a distribution with mass function $\mathbf{p}_N$ equals the entropy of a
complementary distribution with mass function $\mathbf{q}_N$ whose components are defined relative to
those of $\mathbf{p}_N$ by $q_i = (1 - p_i)/(N - 1)$ for each $i = 1, \ldots, N$. The complementary mass function
$\mathbf{q}_N$ plays the role of a photographic negative with respect to the direct mass function $\mathbf{p}_N$.

When N = 2, this general definition of a complementary distribution reduces to the standard
definition of the distribution for an event complementary to E, $\tilde{E} = 1 - E$. However,
the definition generalises the concept of complementary distributions beyond events to general 
quantities. The content of this complementary distribution (defined only formally in terms of 
the transformation of the original p.m.f.) might be thought of as the distribution of "unlikeliness"
as opposed to the distribution of "likeliness". The fact that an extropy* equals some
complementary entropy underlies the intimate dual relationship between extropy and entropy. 

The mapping of a probability mass function $\mathbf{p}_N$ to its complement $\mathbf{q}_N$ is a contraction
mapping. Every mass function in a unit-simplex is mapped onto a complementary function 
lying within an inscribed simplex of the same dimension. In turn, this complementary mass 
function has its own complementary distribution within a simplex inscribed in that one. The 
fixed-point theorem for contraction mappings assures that the uniform distribution in the centre 
of the unit-simplex is the unique mass function whose complementary mass function is itself. 
Figure 4 displays the way this contraction works in two dimensions for mass functions $\mathbf{p}_3$.

As a numerical and geometrical example, consider again Figure 3 in the context of the
following numerical computations. Notice to begin that $H(\frac14, \frac12, \frac14) = 1.0397$ and $J(\frac14, \frac12, \frac14) = .7781$
are identifiable as points on specific entropy and extropy contours. The extropy measure
would be rescaled as extropy* via $J^*(\frac14, \frac12, \frac14) = \frac12 J(\frac14, \frac12, \frac14) + \log(2) = 1.0822$ according to
(7). Notice also that $J^*(\frac14, \frac12, \frac14)$ also equals $H(\frac38, \frac14, \frac38)$ according to Result 5. Now viewing again
Figure 3 (right), notice firstly that the isoextropy contour including $J(\frac14, \frac12, \frac14) = .7781$ is precisely
the flipped image of the isoentropy contour including $H(\frac14, \frac12, \frac14) = 1.0397$ which appears in
Figure 3 (left). Both of these contours lie within and tangent to the triangle $S^c$ if it would be
inscribed within the unit-simplices $S^2$ of Figures 3 (left) and 3 (right). Now the numerical value
of $J^*(\frac14, \frac12, \frac14)$ equals the entropy value of a complementary mass function, lying on the contour
including $H(\frac38, \frac14, \frac38) = 1.0822$. This contour lies within and tangent to the subsimplex $S^{cc}$ if it
would be inscribed within $S^c$ in Figure 3 (left). This visualisation completes the understanding

[Figure 4 image not reproduced: a unit-simplex with vertices labelled (0,1,0), (1,0,0) and (0,0,1), containing nested inscribed simplices.]

Figure 4: The complementary distribution mapping contracts the unit-simplex $S$ into the inscribed
simplex $S^c$, which it contracts in turn into the inscribed $S^{cc}$, and then into $S^{ccc}$, and so
on.

of extropy as the dual of entropy: their iso-function contours are flips of one another; and their 
function values are related through the complementary contraction mapping. Notice that the 
symmetrical equations relating $J(\cdot)$ to $H(\cdot)$ in Result 4 hold for $J^*(\cdot)$ and $H(\cdot)$ as well.
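
The numerical example of this section, Result 5, and the contraction toward the uniform distribution can all be reproduced in a few lines. The sketch below is our own illustration; the function names `complement` and `extropy_star` and the choice of 30 iterations are assumptions.

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def extropy(p):
    return -sum((1 - x) * math.log(1 - x) for x in p if x < 1)

def complement(p):
    """Complementary mass function q_i = (1 - p_i) / (N - 1)."""
    n = len(p)
    return [(1 - x) / (n - 1) for x in p]

def extropy_star(p):
    """Rescaled extropy J*(p) = J(p) / (N - 1) + log(N - 1)."""
    n = len(p)
    return extropy(p) / (n - 1) + math.log(n - 1)

p = [0.25, 0.5, 0.25]
q = complement(p)                      # (3/8, 1/4, 3/8)
print(round(entropy(p), 4), round(extropy(p), 4))
print(round(extropy_star(p), 4), round(entropy(q), 4))

# Iterating the complement mapping contracts toward the uniform mass function.
current = p
for _ in range(30):
    current = complement(current)
print(current)
```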

6 Concluding Discussion 

What's in a name? We are aware of prior uses of the word "extropy", documented in both 
the Online Oxford English Dictionary and in Wikipedia. In one usage it seems to have arisen 
as a metaphorical term rather than a technical term, naming a proposed primal generative 
natural force that stimulates order in both physical and informational systems rather than 
disorder. In the other, within a technical context, "extropy" also has had some parlance using 
it equivalently to the more commonly used "negentropy". Neither type of usage appears to be
very common. While we are not stuck on this particular word, the information measure we
have introduced in this article seems aptly to merit the coinage of "extropy". Whereas entropy
is recognised as the expected log probability of the occurring value of X (a measure which 
could be considered "interior" to the observation X), our proposed extropy is derived from the 
sum of log non-occurrence probabilities less the expected log non-occurrence probability. This 
could be considered to be a measure exterior to the observation X. The exterior measure of all 
the non-occurring quantity possibilities is complementary to the entropy measure of the unique 
occurring possibility. Aside from our interest in a discussion of the propriety of this name, we 
are much more interested in the information-technical community's response to the content of
the results we have presented. 

It may be recognised that the assessment of entropies is fundamentally related to the theory 
of proper scoring rules for alternative forecast distributions. See, for example, Lad (1996, ch.
6) and Gneiting and Raftery (2008). The log probability for the observed outcome of X is a 
proper scoring rule for forecasting mass functions with its own touted unique characteristics. 
The expectation of this log score is the negentropy in the distribution. Our recognition that 
any extropy* is also the entropy of a complementary distribution raises questions about
the uniqueness characteristics of the log scoring rule. A detailed discussion in the context of a 
statistical application is in preparation. 

Given the broad range of applications of entropy over the past half-century, we suspect that 
the awareness of extropy as a complementary dual measure to entropy will raise as many new 
interesting questions as it answers. 
A Appendix

Entropy ≥ Extropy

Let X be a random quantity with a finite discrete realm of possibilities $\{x_1, x_2, \ldots, x_N\}$ with
probability masses $p_i$, with $p_i = P(X = x_i)$, $i = 1, \ldots, N$. We recall that

$$H(X) = -\sum_{i=1}^{N} p_i \log(p_i)$$

and

$$J(X) = -\sum_{i=1}^{N} (1 - p_i)\log(1 - p_i).$$

We introduce the following real functions defined on [0, 1] which will be useful later:

• $s(p) = -p\log(p)$, with $s(0) = 0$;

• $t(p) = s(1 - p) = -(1 - p)\log(1 - p)$, with $t(1) = 0$;

• $u(p) = s(p) - t(p) = -p\log(p) + (1 - p)\log(1 - p)$, with $u(0) = u(1) = 0$.

The function u(p) satisfies the following properties (see Figure 5):

1. $u(p) = 0$ iff [p = 0, or p = 1 or p = 1/2].

2. $u(p) > 0$ iff $0 < p < \frac12$.

3. $u(p) < 0$ iff $\frac12 < p < 1$.

4. $u(1 - p) = -u(p)$, for all $p \in [0, 1]$.

5. $u(p)$ is strictly concave in $[0, \frac12]$, that is, for any given pair $(p_1, p_2)$ with $0 \le p_1 < p_2$, $p_2 \in (0, \frac12]$,
and for any given $\alpha \in (0, 1)$, we have

$$u(\alpha p_1 + (1 - \alpha) p_2) > \alpha u(p_1) + (1 - \alpha) u(p_2).$$

By exploiting the function u(p), it is evident that

$$H(X) - J(X) = \sum_{i=1}^{N} u(p_i).$$

This difference is permutation invariant with respect to the components $p_i$.
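
The listed properties of u(p) and the decomposition of H(X) - J(X) can be spot-checked numerically. The sketch below is our own illustration; the function names and the sample points are assumptions, and a check at a few points is of course no proof.

```python
import math

def xlogx(x):
    """x log(x) with the convention 0 log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0

def u(p):
    """u(p) = -p log(p) + (1 - p) log(1 - p)."""
    return -xlogx(p) + xlogx(1 - p)

def entropy(p):
    return -sum(xlogx(x) for x in p)

def extropy(p):
    return -sum(xlogx(1 - x) for x in p)

# Properties 1 to 4 at a few sample points.
print(u(0.0), u(0.5), u(1.0))        # all zero
print(u(0.2) > 0, u(0.8) < 0)        # sign on either side of 1/2
print(u(0.8), -u(0.2))               # antisymmetry about 1/2

# Property 5 at one pair of points in [0, 1/2].
p1, p2, alpha = 0.1, 0.4, 0.3
print(u(alpha * p1 + (1 - alpha) * p2) > alpha * u(p1) + (1 - alpha) * u(p2))

# H(X) - J(X) equals the sum of the u(p_i).
masses = [0.6, 0.25, 0.15]
print(entropy(masses) - extropy(masses), sum(u(x) for x in masses))
```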

We observe that for any N > 1, if there exists $i \in \{1, 2, \ldots, N\}$ such that $p_i = 0$, then by
considering an arbitrary quantity Y with a realm of cardinality N - 1 and probability masses
$(p_1, p_2, \ldots, p_{i-1}, p_{i+1}, \ldots, p_N)$ we are ensured that

$$H(X) = H(Y) \quad \text{and} \quad J(X) = J(Y).$$

Figure 5: The function $u(p) = -p\log(p) + (1 - p)\log(1 - p)$.

We have the following result.

Theorem 1. Let X be a finite random quantity, with realm $\{x_1, x_2, \ldots, x_N\}$, and probability
masses $(p_1, p_2, \ldots, p_N)$ such that $p_i > 0$, for all $i = 1, 2, \ldots, N$. Then we have

a) $H(X) = J(X)$ if $N \le 2$;

b) $H(X) > J(X)$ if $N > 2$.

Proof. Case a). If N = 1 we trivially have $H(X) = J(X) = 0$ and, if N = 2 it is $H(X) =
J(X) = -x\log(x) - (1 - x)\log(1 - x)$.

Case b). We distinguish two alternatives: b1) $p_i \le \frac12$, $i = 1, 2, \ldots, N$; and b2) $p_i > \frac12$ for only
one index i.

Case b1). By the hypotheses, for each i, $0 < p_i \le \frac12$ and $\sum_{i=1}^{N} p_i = 1$. It follows from Properties
1 and 2 of the function u(p) that

$$H(X) - J(X) = \sum_{i=1}^{N} u(p_i) > 0.$$

Case b2). To begin, suppose that N = 3. Without loss of generality we can assume $p_3 > \frac12$,
because of the permutation invariance of the sum of the $u(p_i)$; consequently $0 < p_1 + p_2 < \frac12$. Now from Property
4 we deduce

$$u(p_3) = -u(1 - p_3) = -u(p_1 + p_2).$$

Then the statement $H(X) - J(X) = u(p_1) + u(p_2) - u(p_1 + p_2) > 0$ amounts to $u(p_1) + u(p_2) >
u(p_1 + p_2)$. Since u(p) is strictly concave over the interval $[0, \frac12]$ (see Property 5) and $u(0) = 0$,
we have

$$u(p_1) = u\left(\frac{p_2}{p_1 + p_2} \cdot 0 + \frac{p_1}{p_1 + p_2}(p_1 + p_2)\right) > \frac{p_2}{p_1 + p_2}\, u(0) + \frac{p_1}{p_1 + p_2}\, u(p_1 + p_2) = \frac{p_1}{p_1 + p_2}\, u(p_1 + p_2) \qquad (8)$$

and

$$u(p_2) > \frac{p_1}{p_1 + p_2}\, u(0) + \frac{p_2}{p_1 + p_2}\, u(p_1 + p_2) = \frac{p_2}{p_1 + p_2}\, u(p_1 + p_2). \qquad (9)$$

From (8) and (9) it follows $u(p_1) + u(p_2) > u(p_1 + p_2)$ and then $H(X) - J(X) > 0$.

Generally, let N > 2. Again without loss of generality we can assume $p_N > \frac12$. We have

$$u(p_N) = -u(1 - p_N) = -u(p_1 + p_2 + \cdots + p_{N-1}).$$

For each $i = 1, \ldots, N - 1$, it is easy to see that

$$u(p_i) > \frac{p_i}{p_1 + p_2 + \cdots + p_{N-1}}\, u(p_1 + p_2 + \cdots + p_{N-1}), \qquad (10)$$

because of the concavity of $u(\cdot)$.
Finally, we have

$$H(X) - J(X) = \sum_{i=1}^{N} u(p_i) = \sum_{i=1}^{N-1} u(p_i) - u(p_1 + p_2 + \cdots + p_{N-1}) > 0.$$

□